Using quick transcriptions to improve conversational speech models

Authors

  • Owen Kimball
  • Chia-Lin Kao
  • Rukmini Iyer
  • Teodoro Arvizo
  • John Makhoul
Abstract

Using large amounts of training data may prove to be critical to attaining very low error rates in conversational speech recognition. Recent collection efforts by the LDC[1] have produced a large corpus of such data, but to be useful, it must be transcribed. Historically, the cost of transcribing conversational speech has been very high, leading us to consider quick transcription methods that are significantly faster and less expensive than traditional methods. We describe the conventions used in transcription and an automatic utterance segmentation algorithm that provides necessary timing information. Experiments with models trained on a 20-hour set demonstrate that quick transcription works as well as careful transcription, even though the quick transcripts are produced roughly eight times as fast. We also show that when added to a large corpus of carefully transcribed data, quickly transcribed data gives significant improvements in a state-of-the-art ASR system.

Similar resources

Pronunciation variant analysis using speaking style parallel corpus

To improve the recognition accuracy for spontaneous conversational speech, we collected a corpus to study how spontaneous conversational speech differs from read style speech. The corpus consists of two parts: 1) spontaneous conversational speech and 2) read speech with the same word transcriptions as the conversational speech. In word and phone recognition experiments, it was confirmed that, f...

Training topic classifiers for conversational speech with limited data

In this paper we demonstrate how automatically generated transcriptions can be used to develop an effective topic classification application. Two key contributions of our work are (a) investigating the impact of unsupervised transcriptions on topic classification where the transcription system has been trained with very limited amounts of data, and (b) demonstrating the use of mixture language ...

Lexicon-Free Conversational Speech Recognition with Neural Networks

We present an approach to speech recognition that uses only a neural network to map acoustic input to characters, a character-level language model, and a beam search decoding procedure. This approach eliminates much of the complex infrastructure of modern speech recognition systems, making it possible to directly train a speech recognizer using errors generated by spoken language understanding ...

Resegmentation of SWITCHBOARD

The SWITCHBOARD (SWB) corpus is one of the most important benchmarks for recognition tasks involving large vocabulary conversational speech (LVCSR). The high error rates on SWB are largely attributable to an acoustic model mismatch, the high frequency of poorly articulated monosyllabic words, and large variations in pronunciations. It is imperative to improve the quality of segmentations and tr...

Adding Sentence Boundaries to Conversational Speech Transcriptions using Noisily Labelled Examples

This paper presents a technique for adding sentence boundaries to text obtained by Automatic Speech Recognition (ASR) of conversational speech audio. We show that starting with imprecise boundary information added by using only silence information from an ASR system, we can improve boundary detection using head and tail phrases. The main purpose for the insertion of sentence boundaries to ASR c...

Publication date: 2004